God Components Project Statistics

Loading datasets

Import dependencies.

Load commit data.

Drop some columns we don't need.

Make sure the date column is indeed parsed as datetime.
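A minimal sketch of the loading steps above, using pandas. The inline CSV and the column names (`commit`, `author`, `datetime`, `message`) are hypothetical stand-ins; the real commit file may be laid out differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for the commit file; real column names may differ.
raw_csv = io.StringIO(
    "commit,author,datetime,message\n"
    "a1,alice,2015-03-01T10:00:00,Fix parser\n"
    "b2,bob,2015-03-02T12:30:00,Add detector\n"
)
commits = pd.read_csv(raw_csv)

# Drop a column that is not needed for the analysis.
commits = commits.drop(columns=["message"])

# Convert the date column explicitly and confirm the resulting dtype.
commits["datetime"] = pd.to_datetime(commits["datetime"])
assert pd.api.types.is_datetime64_any_dtype(commits["datetime"])
```

Converting explicitly and checking the dtype guards against the column silently staying a string column, which would break later date arithmetic.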

Load the aggregated report file, all_reports.csv. It contains Designite report data for every single Tika commit.

Drop some columns.

Add commits to report data, combining them into one big dataset, gcdata.
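One way the combination could look, assuming both tables share a commit identifier column. The column names and values below are made up for illustration; only the name `gcdata` comes from the text above.

```python
import pandas as pd

# Hypothetical commit table: one row per commit.
commits = pd.DataFrame({
    "commit": ["a1", "b2"],
    "author": ["alice", "bob"],
    "datetime": pd.to_datetime(["2015-03-01", "2015-03-02"]),
})

# Hypothetical report table: one row per (commit, God Component).
reports = pd.DataFrame({
    "commit": ["a1", "a1", "b2"],
    "package": ["p.parser", "p.mime", "p.parser"],
    "classes": [31, 30, 33],
})

# Attach commit metadata to every report row. validate= raises an error
# if a commit id unexpectedly appears more than once in the commit table.
gcdata = reports.merge(commits, on="commit", how="left", validate="many_to_one")
```

A left join keeps every report row even when a commit is missing from the commit table, so gaps show up as missing values instead of disappearing.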

God component lifetime

General statistics on lifetime.
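A sketch of how lifetime could be computed, under the assumption that a God Component's lifetime is the time between the first and the last commit in which it is reported. That definition, the column names, and the data are illustrative and may not match the original analysis.

```python
import pandas as pd

# Hypothetical observations: one row per commit in which a GC was reported.
gcdata = pd.DataFrame({
    "package": ["p.parser", "p.parser", "p.mime", "p.mime"],
    "datetime": pd.to_datetime(
        ["2015-01-01", "2015-01-11", "2015-02-01", "2015-02-04"]
    ),
})

# First and last sighting of each God Component.
lifetime = gcdata.groupby("package")["datetime"].agg(
    first_seen="min", last_seen="max"
)
lifetime["lifetime_days"] = (
    lifetime["last_seen"] - lifetime["first_seen"]
).dt.days

# Count, mean, spread, and quartiles of the lifetimes.
summary = lifetime["lifetime_days"].describe()
```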

Average over the above stats

Total number of God Components

Number of classes per God Component

Computing the chronological difference (delta) in the # classes
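The delta can be sketched as a per-component difference between consecutive observations. The column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical class counts per God Component, deliberately out of order.
gc_classes = pd.DataFrame({
    "package": ["p.parser", "p.mime", "p.parser", "p.mime", "p.parser"],
    "datetime": pd.to_datetime(
        ["2015-01-03", "2015-01-01", "2015-01-01", "2015-01-02", "2015-01-02"]
    ),
    "classes": [40, 30, 31, 30, 36],
})

# Sort chronologically within each component before differencing.
gc_classes = gc_classes.sort_values(["package", "datetime"]).reset_index(drop=True)

# Difference to the previous observation of the same component.
# The first observation of each component has no predecessor, so it is NaN.
gc_classes["delta"] = gc_classes.groupby("package")["classes"].diff()
```

Grouping before calling `diff` prevents the first row of one component from being compared with the last row of another.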

Save a small version of the # classes dataframe, keeping only the rows where the absolute difference > 3
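A possible form of this filter, reading the threshold as applying to the absolute value of the delta. The data and column names are hypothetical, and the result is written to an in-memory buffer here so the sketch has no side effects.

```python
import io

import pandas as pd

# Hypothetical class counts with a precomputed delta column.
gc_classes = pd.DataFrame({
    "package": ["p.parser"] * 4,
    "classes": [31, 36, 37, 32],
    "delta": [float("nan"), 5.0, 1.0, -5.0],
})

# Keep only large jumps in either direction; NaN rows drop out
# because comparisons with NaN are False.
large_changes = gc_classes[gc_classes["delta"].abs() > 3]

# Write the reduced table as CSV (a file path would work the same way).
buffer = io.StringIO()
large_changes.to_csv(buffer, index=False)
```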

God Component growth in terms of Lines Of Code (LOC)

Load up data on Lines Of Code for every God Component at the state of every commit.

Add commit datetime.
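Adding the commit datetime could be done with a lookup keyed on the commit id, as sketched below. The table layouts, column names, and LOC values are assumptions.

```python
import pandas as pd

# Hypothetical commit table.
commits = pd.DataFrame({
    "commit": ["a1", "b2"],
    "datetime": pd.to_datetime(["2015-03-01", "2015-03-02"]),
})

# Hypothetical LOC table: one row per (commit, God Component).
gc_loc = pd.DataFrame({
    "commit": ["a1", "b2", "b2"],
    "package": ["p.parser", "p.parser", "p.mime"],
    "loc": [12000, 12450, 8000],
})

# Build a commit -> datetime lookup and map it onto the LOC rows.
commit_time = commits.set_index("commit")["datetime"]
gc_loc["datetime"] = gc_loc["commit"].map(commit_time)
```

`map` leaves the row count untouched, which makes it a convenient choice when only a single column needs to be attached.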

Investigating what developers contribute to GC buildup

How many developers contributed to God Components? We aim to answer this question in terms of both God Component (1) buildup and (2) refactoring. We do this by considering the # classes added and the # classes removed by each developer.
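The per-developer split could be sketched as follows, assuming each commit row carries an author and a signed class delta (positive for added classes, negative for removed ones). The names and numbers are made up.

```python
import pandas as pd

# Hypothetical per-commit changes to God Components.
changes = pd.DataFrame({
    "author": ["alice", "bob", "alice", "carol", "bob"],
    "delta": [5, -2, 1, 0, -4],
})

# Split the signed delta into non-negative "added" and "removed" parts,
# then total each part per developer.
added = changes["delta"].clip(lower=0)
removed = (-changes["delta"]).clip(lower=0)
per_dev = pd.DataFrame({
    "classes_added": added.groupby(changes["author"]).sum(),
    "classes_removed": removed.groupby(changes["author"]).sum(),
})

# Developers involved in buildup and in refactoring, respectively.
n_builders = int((per_dev["classes_added"] > 0).sum())
n_refactorers = int((per_dev["classes_removed"] > 0).sum())
```

Keeping additions and removals in separate columns avoids a developer's removals cancelling out their additions in a single net figure.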

Also, show that only a few developers work on the project as a whole. Developers versus LOC added/removed:

Jira Issues analysis

Add the Tika Jira issue tracker information as a data source.

Keep only issue key and issue type columns.

Generally, how many and what types of issues are in Apache Tika's Jira issue tracker?

... of which the following numbers are involved in God Component commits:

... which is this percentage:
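A sketch of the three figures above: issues per type, issues per type that appear in God Component commits, and the overall share. The issue keys, the types, and the way keys are linked to commits are all assumptions here.

```python
import pandas as pd

# Hypothetical issue tracker export with only key and type.
issues = pd.DataFrame({
    "key": ["TIKA-1", "TIKA-2", "TIKA-3", "TIKA-4"],
    "type": ["Bug", "Bug", "Improvement", "New Feature"],
})

# Hypothetical set of issue keys referenced by God Component commits.
gc_issue_keys = {"TIKA-1", "TIKA-3"}

# Issues per type over the whole tracker.
total_per_type = issues["type"].value_counts()

# Issues per type among those referenced by God Component commits.
in_gc = issues["key"].isin(gc_issue_keys)
gc_per_type = issues[in_gc]["type"].value_counts()

# Share of all issues involved in God Component commits, in percent.
pct_involved = 100 * in_gc.mean()
```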

We can also check what issue types are represented most in the God Component commits:

Build a pivot table and show heatmap.
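The pivot step could look like the sketch below, which counts issue types per God Component. The column names and data are hypothetical, and plotting is left out so the example only needs pandas.

```python
import pandas as pd

# Hypothetical rows: one per (God Component commit, linked issue).
gc_issues = pd.DataFrame({
    "package": ["p.parser", "p.parser", "p.mime", "p.parser"],
    "type": ["Bug", "Improvement", "Bug", "Bug"],
})

# Count occurrences of each issue type per component; combinations
# that never occur are filled with 0.
pivot = gc_issues.assign(n=1).pivot_table(
    index="package", columns="type", values="n", aggfunc="sum", fill_value=0
)
```

The resulting table can be passed to a plotting function such as `seaborn.heatmap` to draw the heatmap.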

Issue types that contribute to GC buildup

This time around, only include the commits that actually 'build up' or 'decrease' the size of a God Component, i.e. commits that add classes to or remove classes from a GC.
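A sketch of this restriction, assuming each God Component commit row has an issue type and a signed class delta. The column names, the labels `buildup` and `decrease`, and the data are illustrative.

```python
import pandas as pd

# Hypothetical God Component commits with linked issue type and class delta.
gc_commits = pd.DataFrame({
    "type": ["Bug", "Improvement", "Bug", "New Feature", "Improvement"],
    "delta": [0, 4, -2, 3, 0],
})

# Keep only commits that change the number of classes in a GC.
changing = gc_commits[gc_commits["delta"] != 0].copy()

# Label each remaining commit by the direction of the change.
changing["direction"] = changing["delta"].gt(0).map(
    {True: "buildup", False: "decrease"}
)

# Number of commits per issue type and direction.
counts = changing.groupby(["type", "direction"]).size()
```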